On the Centum Features of Thraco-Dacian Language 


Abstract: The centum/satem distinction refers to the first two dialects that emerged in
Proto-Indo-European, which differ in their treatment of the Proto-Indo-European palatal velars *k', *g'
and *g'h. The western dialect was named centum and the eastern one satem. Following this
distinction, we can demonstrate that the Thraco-Dacian language is a centum, not a satem, language,
contrary to what has been believed from the 19th century to the present. The Romanian lexical elements
of Thraco-Dacian origin discussed in this article have centum, not satem, features, which proves that
this language was a centum language, and one related to Latin and the other Italic languages.
Furthermore, this allows us to distinguish genuine Slavic loanwords in Romanian from Romanian
loanwords in Slavic. Until now Romanian linguists have been completely unaware of these details.

Since the end of the 19th century, and especially since the 1990s, there has been a dispute in
Romania between the supporters of the Latin origin and those of the Thraco-Dacian origin of the
Romanian language. Although I am a convinced supporter of the Thraco-Dacian origin of the Romanian
language, I do not consider myself part of either group, since both are mistaken in one way or
another. The Latinists are mistaken in exaggerating the contribution of Latin to the formation of the
Romanian language, and the Dacicists are wrong in not knowing linguistics, so that each of them talks
nonsense in their own way. I have been concerned with the problem of the origin of the Romanian
language since I was a university student. My intuitions materialized much later, after long
research, in a number of books and papers, including an Etymological Dictionary of the Romanian
Language based on Indo-European Studies, since I realized that the Latin hypothesis of the origin of
the Romanian language is an extremely limited perspective which leaves too many questions unanswered.
In short, Latin is only one language in a wider group of related languages, a fact which was not
understood for a long time and is still misunderstood, not only by Romanian linguists but also by many
other researchers of Romance languages. The fact is interesting and deserves to be discussed in more
detail, but the purpose of this article is to show that, despite some appearances, Thraco-Dacian was a
centum language related to the Italic languages, including Latin, and not a strange, little-known
satem language. According to my data (see DELR), a rather small number of lexical items have
correspondents in Latin (around 14%, most of them cognates of Latin), but a much larger share, 62%,
have cognates in Indo-European languages other than Latin and can be connected to various
Proto-Indo-European roots. The rest are imitative formations (6%), loanwords from various languages
(around 12%) and words of uncertain origin (6%). Many of the loanwords are marginal lexical items,
and most of them have no derivatives.

The idea that Thraco-Dacian belongs to the satem group appeared in the 19th century due to poor
knowledge of this language and the lack of interest in Romanian as a possible repository of
Thraco-Dacian words and their phonological features. I should mention that Thracian, Illyrian and
Dacian are different names for the same language. Some linguists consider Illyrian a centum language,
while treating Thracian (and Dacian) as satem. This is nonsense, since these names define one
language. There is no doubt that there may have been dialectal differences, but in the early Middle
Ages there was only one language spoken from the Adriatic Sea to the Black Sea and further to the
east. It was Proto-Romanian, which must have had one single ancestor before the invasions of the
Slavs and the Magyars in the last centuries of the 1st millennium AD, no matter whether we assume a
process of Romanization or not. Even today there are pockets of Romanian speakers all over the Balkan
region and to the east and north of Romania and the Republic of Moldova, in Poland and Ukraine. Suidas
(10th century AD) defines the Illyrians as barbarian Thracians (Illyrioi barbaroi Thrakoi), while
Strabo (1st century AD) shows that Thracians and Illyrians spoke the same language. For the sake of
simplicity, in this article I will use the term Thraco-Dacian instead of Thraco-Illyro-Dacian. There
are many controversial hypotheses based on dubious evidence or insufficiently corroborated data
regarding Illyrian, Thracian and Dacian, but I will not discuss them here. Instead, I draw the
evidence only from the lexicon of contemporary Romanian.


The 19th-century Indo-Europeanists deduced that the Indo-European languages can be divided
into two main groups, namely the centum group, located in the western Indo-European linguistic area,
and the satem group, located in the eastern part of this area. The terms denote the number 'hundred'
(100) in Latin (centum) and in Avestan (satem) respectively, the latter as attested in the Zoroastrian
religious scriptures.

Unfortunately, in Romanian linguistics there is no interest in this subject, not even at the highest
levels. Although the opinions of linguists regarding the nature and the number of Proto-Indo-European
velars have varied over the years, one may say that the Proto-Indo-European language had three
voiceless velars, *k, *k', *kW, and six voiced velars: three unaspirated, *g, *g', *gW, and three
aspirated, *gh, *g'h, *gWh. Of these, we are particularly interested in the palatal velars *k', *g',
*g'h. The palatal velars turned into plain velars in Thraco-Dacian as in any other centum language,
while *kW and *gW had a very interesting evolution in Thraco-Dacian, similar to that of the
Continental Celtic languages, Osco-Umbrian and the northern Greek dialects. Namely, the labio-velars
turned into bilabials after a back vowel (a, o) and into an affricate (or sibilant) after a front
vowel (e, i), whereas in Osco-Umbrian and the Continental Celtic languages all labio-velars turned
into bilabials regardless of the phonological environment. Details are discussed in the Introduction
to the DELR.
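
As a reading aid, the velar inventory and the centum treatment of the palatal row can be sketched in
a few lines of Python. This is only an illustrative sketch: the names PIE_VELARS and centum_merge are
hypothetical, the ASCII notation (k' for a palatal velar, kW for a labio-velar) follows this article,
and the conditioned development of the labio-velars is deliberately not modelled.

```python
# Illustrative sketch only; the names are hypothetical, the notation follows the article:
# k' = palatal velar, kW = labio-velar, h = aspiration.
PIE_VELARS = {
    "voiceless": ["k", "k'", "kW"],
    "voiced unaspirated": ["g", "g'", "gW"],
    "voiced aspirated": ["gh", "g'h", "gWh"],
}


def centum_merge(segment: str) -> str:
    """Collapse a palatal velar with its plain counterpart (centum treatment).

    Plain velars and labio-velars are returned unchanged.
    """
    return segment.replace("'", "")


if __name__ == "__main__":
    for series, segments in PIE_VELARS.items():
        for seg in segments:
            print(f"{series:20} *{seg:4} -> {centum_merge(seg)}")
```

In this sketch *k' and *k both come out as k, which is the merger that defines a centum language; a
satem language would instead keep the palatal row distinct and turn it into sibilants.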

Returning to the first two dialects of the Indo-European language, the sounds *k' and *g' may
have turned first into affricates and then into sibilants (š, ś or s) in the eastern dialect, from
which Indo-Aryan, Slavic, Baltic and Armenian originate. Many linguists include Albanian in the satem
group, but Albanian is centum rather than satem, although it is somewhat ambiguous in this respect. A
thorough study of the Albanian language in this regard would be of particular interest, but it seems
to me that no one has done it so far. On the other hand, there is some evidence that Albanian and
Armenian show a separate treatment of all three rows of the Proto-Indo-European velars. In Albanian,
Proto-Indo-European *kW and *gW differ from *k and *g before a front vowel, and the same rule applies
to Armenian. This seems to be a parallel development in the two languages. In other words, these two
languages followed their own way, different from the satem group as well as from the centum group.
In my previous works, I showed that Albanian is not a descendant of a main Illyrian dialect, but a
descendant of Epirotic dialects, which had been different since ancient times (see DELR,
Introduction).

In the western Proto-Indo-European dialect, the palatal velars *k' and *g' collapsed with
their non-palatal counterparts *k and *g. The Italic (including Latin), Greek, Celtic and Germanic
languages belong to the centum group. According to the data presented here, the Thraco-Daco-Illyrian
language must also be included here. Although Tocharian was spoken somewhere in today's western
China, it was a centum language. It seems that the Tocharians migrated from the centum area after the
first centum/satem division, perhaps from somewhere in central Europe. Some also include Hittite in
the centum group, but in fact its situation is a little more complicated. Hittite belongs to the
Anatolian group along with Lydian, Luwian and other languages spoken in Anatolia. Today, more and
more linguists believe that Proto-Anatolian was a sister of Proto-Indo-European, not its daughter. I
think this is the correct position regarding the Anatolian languages. There is more evidence in this
respect, but I will just mention here that Lydian preserved the Proto-Indo-European velars as
presented above, being more conservative in this respect than Hittite, while Hittite followed the
same process as the Indo-European centum group, but completely independently.


The forms for 'dog' in most Indo-European languages are derived from PIE *k'uon-, *k'un-
'dog' (IEW, 632). The Romanian noun câine 'dog' has a corresponding form in Latin and is therefore
considered to be of Latin origin. However, it is attested in the Dacian plant name kinoboula (or
kinoboila), found in Dioscorides, and in Apuleius the same plant appears as cinubula (or dinupula), a
plant associated with the dog's sexual organ. There are a number of forms of this kind in today's
Romanian language, which I have discussed in DELR (see câine in DELR, 200). As one can see, the first
part of this compound, kino-, which is considered to mean 'dog', has a velar (k) and not a sibilant
as in the satem languages: cf. Armenian šun 'dog', Old Prussian sunis 'id', Mid. Persian sak 'id',
Russian suka 'she-dog', etc. In Albanian we have qen for male dog, a centum form, but shakë for
female dog, a satem form. It is an interesting phenomenon, but we will not discuss it here.
Linguists, of course, say that qen is of Latin origin, while shakë is inherited from Proto-Albanian.
However, Albanian has other words which have centum features and cannot be of Latin origin (see
below). In addition, Albanian is situated in the centum area.
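
The diagnostic applied here, and in the examples that follow, can be summarized as a small Python
sketch: for a root that began with a PIE palatal velar, a reflex with an initial velar counts as a
centum-type outcome and a reflex with an initial sibilant as a satem-type outcome. The function name
classify_reflex, the two sound sets and the hand-supplied initial sounds are assumptions made for
illustration only; the forms are those cited above for PIE *k'uon-.

```python
# Toy classifier; the sound sets and all names are illustrative assumptions.
VELAR_OUTCOMES = {"k", "g"}
SIBILANT_OUTCOMES = {"s", "z", "š", "ž", "ś"}

# (language, cited form, initial sound supplied by hand) for reflexes of PIE *k'uon- 'dog'
DOG_REFLEXES = [
    ("Dacian", "kino-", "k"),
    ("Old Prussian", "sunis", "s"),
    ("Middle Persian", "sak", "s"),
    ("Russian", "suka", "s"),
]


def classify_reflex(initial: str) -> str:
    """Label the outcome of a PIE palatal velar by the initial sound of its reflex."""
    if initial in VELAR_OUTCOMES:
        return "centum-type"
    if initial in SIBILANT_OUTCOMES:
        return "satem-type"
    return "indeterminate"


def classify_all(reflexes):
    """Return a {language: label} mapping for a list of (language, form, initial) tuples."""
    return {language: classify_reflex(initial) for language, _form, initial in reflexes}


if __name__ == "__main__":
    for language, label in classify_all(DOG_REFLEXES).items():
        print(f"{language}: {label}")
```

The classifier works on sounds, not spellings, so the initial sound has to be supplied by hand. A
later palatalization before a front vowel would make a form look satem-type by this test, and such
forms need to be judged separately.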

In the case of the Romanian words of this category, it is difficult to 'prove' that they are of
Dacian origin when they have a Latin correspondent, owing to the deeply rooted misconception that
Romanian is a daughter language of Latin, but also because of the mistaken hypothesis that
Thraco-Dacian was a satem language, very different from Latin. However, there are a number of such
words in Romanian that have no Latin correspondent, yet clearly exhibit centum features, and they are
therefore living testimony to the centum nature of the Thraco-Dacian language. Sometimes there are
whole families of Romanian words originating from the same Proto-Indo-European root, all of them
having centum features.


This is the case with PIE *k'es- 'to cut', with the nominal derivative *k'estro-m 'knife,
cutter' (IEW, 586). The Romanian nouns cosor 'pruning knife' and coasă 'scythe' are derived from the
verbal root, while custură 'knife blade, rudimentary knife', a cresta 'to notch, to dent, to wound'
and creastă 'crest, ridge' are derived from the reconstructed nominal form. From this root, more
precisely from the nominal form, is also derived Latin castro, castrare 'to castrate' (cf. IEW, 586;
de Vaan, 97), which comes close to the Romanian verb a cresta. In all Romanian etymological
dictionaries the first three are considered to be Slavic loanwords, and the fourth is considered to
be of Latin origin, as a derivative of the noun creastă, but such a hypothesis cannot be correct. If
there is a genetic link between the noun creastă and the verb a cresta, then creastă is derived from
the verb a cresta and not vice versa. It is legitimate to ask how these words can be considered to be
of Slavic origin, since they have centum features while the Slavic languages are satem languages.
Due to the lack of elementary knowledge of Indo-European linguistics, such aberrations become
possible and do not bother anyone. Even Julius Pokorny, when discussing this Proto-Indo-European
root, wonders why Slavic kosa 'scythe' displays a velar, not a sibilant as expected in a Slavic
language ('k- statt s- durch Dissimil. gegen das folgende s?', i.e. 'k instead of s by dissimilation
from the following s?'). Of course, the great linguist did not think of a possible Proto-Slavic
loanword from some centum Indo-European language, as we do.


The Romanian noun crai 'king' is also considered to be of Slavic origin, namely from OCS
kral' 'king', in its turn from the Germanic personal name Karl, in reference to Charlemagne. In my
opinion this is a bizarre hypothesis, accepted by Slavicists for lack of a better explanation.

In DELR, I have shown that Romanian crai is derived from PIE *k'rei- 'to be in front, to excel' (IEW,
618). The meaning of today's Romanian crai comes very close to that of Homeric Greek kreion, found
only in the Iliad, being a poetic form, as in the expression kreion Agamemnon 'king Agamemnon'. In
Homer, the feminine form kreiousa 'queen' is also found, only once (Iliad 22, 48; cf. Liddell, 993),
referring to one of Priam's wives, which is a strong argument that the form may be of Trojan or
Thracian origin; it is also almost identical to today's Romanian crăiasă 'queen'. As I have said,
these forms are not found in any of the Classical Greek texts, with the exception of the Dorian
kreioisa (Theoc., cf. Liddell, 993). Given this information, one may say that the Greek forms are
loanwords from Thracian dialects.

Robert Beekes (EDG, 1, 774) shows that the Greek form is inherited from the Indo-European
poetic language, which is correct, but does not suggest that Greek may have borrowed it from another
Indo-European language, since he did not have any knowledge of the Romanian data I have just
presented here. On the other hand, I have no doubt that the Thracians who participated in the Trojan
war alongside the Trojans had their own version of the Iliad, of what happened at Troy. Moreover, in
today's Romanian the nouns crai and crăiasă are poetic forms as well, being found in fairy tales and
in allegorical ballads, but also in great modern poets such as Mihai Eminescu. In other words, one
may say that the Romanian language inherited these words from the Proto-Indo-European poetic language
as well. The relationship with the Slavic and Hungarian forms is not clear to me. They may be
loanwords from Romanian with an epenthetic l, as in the case of boier 'aristocrat' > OCS boljar 'id'
(or toiag 'staff' > OCS toljag 'id'), or from the Germanic personal name Karl, as most Slavicists
believe (see DELR, crai).

Another group of Romanian words, namely colibă 'cottage', cuib (cuibar) 'nest', călțun
'winter sock' and probably șoric, șor or cioric '(fried) pork skin', are derived from PIE *k'el- 'to
cover', with the nominal forms *k'olia, *k'elo-s 'roof, cover' (IEW, 553).

The Romanian noun colibă has been considered by the old school of Romanian linguistics, since
Miklosich (19th century), to be of Bulgarian origin, but this term is in fact derived from this
Indo-European root and inherited from Thraco-Dacian; it is not a loanword from Old Church Slavonic or
Bulgarian, since it has centum features. The term is found in all Balkan languages, as well as in
Turkish, Hungarian and Ukrainian. The forms in all these languages cannot be loanwords from
Bulgarian, because Bulgarian was not in contact with Hungarian and Ukrainian, whereas Romanian was
and still is. On the other hand, Pausanias (Description of Greece, 2nd century AD) records the place
name Kolibe, located somewhere in northern Greece, and therefore in a Thracian area. Such a form does
not exist in ancient Greek. In contrast, the equivalent form in Greek is kalia 'hut, nest', a cognate
of the Romanian forms. Not to mention that the first Slavs arrived in the Balkan region several
hundred years after Pausanias wrote his book.

It is believed that the noun cuib 'nest' comes from the unattested Latin *cubium < cubare 'to
lie down'. If there were any truth in this hypothesis, we would have had *cub or *cubiu in Romanian.
In addition, the one who proposed this hypothesis (Cihac, 19th century) and those who followed him
ignored the forms of the Romanian Balkan dialects spoken in Bulgaria, Greece, Albania and parts of
the former Yugoslavia: cf. Aromanian cul'bu, Megleno-Romanian, Istro-Romanian cul'b, which cannot be
explained by this presumptive Latin etymon. These data disprove the hypothesis in question. So both
the form and the sense of Romanian cuib indicate a completely different origin, which points to one
of the above-mentioned nominal forms, with an additional b sound, *kulibu, as in the case of colibă.

The Romanian căiță 'bonnet' is considered to be a Serbo-Croatian loanword, a hypothesis proposed by
Cihac and taken over by all the other authors of etymological dictionaries. However, I did not find
such a form in Serbo-Croatian. This Romanian noun is derived from the nominal form of the root, which
has been suffixed with -iță, thus *kaliță, with a later palatalization and the subsequent
disappearance of the lateral l, as in the case of cuib.

The nouns șoric '(fried) pork skin', șor 'id' or cioric 'id' (attested in the Republic of
Moldova) seem to be derived from the nominal form *k'elo-s. The sibilant ș (sh) can be explained by
the fact that the velar is followed by a front vowel, a development which took place in
Thraco-Dacian. It is a phonological phenomenon that appeared in Thraco-Dacian and affects all velars
(and dentals as well), regardless of their status in Proto-Indo-European. This state of affairs has
been inherited in Romanian. This kind of palatalization has led linguists to consider Thraco-Dacian a
satem language, due to a poor understanding of the nature of this language. I hope this error will
finally be corrected by the Romanian examples discussed in this article. The lateral l turned into r
as a result of rhotacism (l > r), another phenomenon frequently found in Romanian. The Romanian
Explicative Dictionary (DEX) and other Romanian etymological dictionaries consider these forms to be
of unknown origin.

From PIE *k'erdho-, *k'erdha 'herd, flock' (IEW, 579) there are three different forms in
Romanian, dialectal variants with slightly different meanings: cârd 'flock (of sheep), flight
(dial.)', ciurdă 'herd (of cattle), crowd' and cireadă 'herd (of cattle)'. It seems to me that cârd
is derived from *k'erdho-, of neuter gender in Romanian, while the other two come from *k'erdha, both
of feminine gender in Romanian and presumably in Proto-Indo-European as well, judging by the -a
suffix. They may originally have belonged to different dialects of older Romanian (Thraco-Dacian),
but they are understood and used by most native speakers of Romanian.


In the case of the form cârd, it is obvious that the palatal velar *k' has turned into the
plain velar k. It is unnecessary to insist that the old hypothesis which claims Romanian cârd to be a
loanword from Serbo-Croatian is completely absurd and as such must be eliminated, like many others of
its kind. The form ciurdă is derived from the root form *k'erdha, with a palatalization of k before a
front vowel, as I have shown above. It cannot be of Slavic origin, since there was no metathesis of
the liquid r as in OCS čreda, the form inherited by the Slavic languages from the same Indo-European
root. The metathesis of liquids took place back in Proto-Slavic and is present in all Slavic
languages. One may not exclude the possibility that the form cireadă emerged under the influence of
some Slavic dialect, but this remains a simple hypothesis. The details are not discussed here. It is
also possible that the form entered Proto-Slavic as a loanword from Thraco-Dacian, since čreda has an
affricate, not a sibilant (s) as would be expected and as is found in other satem languages; cf.
Sanskrit śardha 'flock, herd', Avestan sarəδa 'tribe, genus'. These assumptions remain open, and more
in-depth research is needed.


Another example is the Romanian noun cracă (creangă) 'branch', which is derived from PIE
*k'ak- 'branch' with the nasal form *k'ank- 'id' (IEW, 523). In the case of the Romanian language (or
Thraco-Dacian), there was an epenthesis of the liquid r, probably to avoid homonymy with cac, cacă
'to defecate'.


A particular case is the Romanian noun cătană (catană) 'soldier', which is considered to be of
Hungarian origin, but this hypothesis is not correct, since the form seems to come from an
Indo-European root, namely PIE *k'at- 'to fight', *k'atu-, *k'at-(e)-ro- 'fight' (IEW, 534). The root
is encountered in several groups of Indo-European languages, being better represented in the Celtic
languages: cf. Gaulish Catu-rix, Old Irish cath '1. fight; 2. band, crowd'. In other words, if
Romanian cătană is derived from this root, it has centum features, but one cannot say exactly whether
it is a Thraco-Dacian word or a Celtic loanword, especially since the form is present only in
Transylvania and Banat (as well as in Pannonia/Hungary), where the Celtic influence was much
stronger. It is well known that the Celtic tribes of the Boii and Taurisci lived for a long time in
south-eastern Pannonia until they were definitively vanquished by Burebista, the great king of Dacia,
in the 1st century BC. If it is of Celtic origin, this example cannot serve as proof for my
demonstration, but it is relevant as evidence of Celtic influence on Dacian prior to Roman times.


Finally, the Romanian noun carâmb is today considered by everyone to be of Thraco-Dacian origin. It
is derived from PIE *k'olemo-s, *k'olema 'stalk, reed' (IEW, 612). Among other things, it has been
associated with Latin calamus, which is a cognate from the same Proto-Indo-European root (cf. DELR,
183). From the same root is derived carabă 'flute, the pipe of a bagpipe, tibia', for which no other
plausible etymology has been found.


Other Romanian words such as a cădea 'to fall', corn 'horn', car 'cart', a curge 'to flow' are in the
same situation as the examples discussed above, but I do not go into details since these forms have
correspondents in Latin.


Regarding the evolution of the Proto-Indo-European voiced palatal velar *g', the situation is
practically identical to that of its voiceless counterpart: it turned into a plain velar (g), which
was preserved as such in Thraco-Dacian and in Romanian as well, as shown in the examples below.


The Romanian forms a grăi 'to speak', grai 'speech, dialect' and gură 'mouth', as well as
hărmălaie 'uproar, noise' and gară 'slander, noisy crowd', are all derived from PIE *g'ar- 'to call,
to scream', with the nominal forms *g'aro, *g'ara, *g'armo- 'call, lamentation' (IEW, 352), with
cognates in several groups of Indo-European languages; cf. Sanskrit gur 'to call', ud-gur 'to raise
the voice', Ossetian zarun 'to sing', zar 'song', Greek γῆρυς 'voice', Latin garrio 'to talk, to
slander', Old Irish gar 'to call', Welsh gair 'word' (cf. DELR, 404). Most Romanian linguists believe
the verb a grăi is a loanword from Serbo-Croatian grajati 'to croak', which is not only ridiculous,
but also an affront to the Romanian language and spirituality. The hypothesis was proposed by
Miklosich and has been taken over by all linguists to this day.


The noun gură 'mouth', instead, was given a Latin origin, namely from Latin gula 'throat', a wrong
etymology, since the two forms do not fit semantically. However, in Latin there is the verb garrio
'to speak, to chatter' and garrulus 'talkative person', much more compatible with Romanian gură
'mouth' (also Romanian guraliv 'talkative person') and almost identical to Romanian a grăi. But these
forms did not attract the attention of the authors of the etymological dictionaries of the Romanian
language. The verb a grăi cannot be derived from Latin garrio, but there is no doubt that they are
cognates. According to Cihac and Cioranescu, the Romanian noun gară is a loanword from ... Polish. It
seems to be a direct descendant of PIE *g'ara, while hărmălaie is a descendant of *g'ar-mo-, where
g > h. (This phenomenon is found in other Romanian words such as horn 'chimney' < PIE *gWhorno-s
'chimney'; cf. Latin fornus.) There are also cognates in Greek and other Indo-European languages (see
DELR, 376, 404, 411).


Romanian gâscă 'goose' and gânsac 'male goose' are derived from PIE *g'hans- 'goose' (IEW, 413).
Cognates are found in most of the Indo-European languages (cf. DELR). The Romanian forms for 'goose'
have been given various Slavic origins. It should be noted that the Slavic forms have centum
characteristics (cf. OCS gosu 'goose'), as opposed to Lithuanian žąsis 'id', which is a satem-type
form. Therefore, it seems that the Slavic form is a loanword from Proto-Romanian (or Thraco-Dacian).
The Romanian form for 'goose' originates from an earlier form gansa < *gansa.


The verb a zgâria 'to scratch' is derived from PIE *g'her- 'to scratch' (IEW, 441), with cognates
only in Greek and Lithuanian. Lithuanian žeriu, žerti 'to scratch' has satem characteristics. I have
to mention that the Baltic languages are less satem than the Slavic languages, which, in my opinion,
proves that the Baltic languages have had older and more intense contacts with speakers of
Thraco-Dacian than the Slavs, who, in their turn, have borrowed from this language loanwords that now
bear a centum mark (see Argument, DELR).


The nouns gard 'fence', grădină (gardină) 'garden' and gardin '1. fence; 2. edge', as well as
grădiște 'the place or remnants of an ancient fortress or city', are derived from PIE *g'her-,
*g'herdh- 'fence', with the nominal form *g'horto-s 'fenced place' (IEW, 442).


For Miklosich, Romanian gard 'fence' is a loanword from OCS gradu 'city', but more recently Romanian
linguists have accepted the idea that it is of Thraco-Dacian origin, by comparison with Albanian
garth 'fence', simply because the Albanian form cannot phonologically be of Slavic origin. Here
Albanian behaves like a real centum language. I have also shown on other occasions that OCS gradu is
actually a loanword from Thraco-Dacian, precisely because this form has centum characteristics. As I
have just mentioned, Romanian grădiște denotes an old city, and there are many places in today's
Romania called Grădiște.


The Polish linguist Z. Golab shows that the Slavic and Baltic languages have parallel forms deriving
from the same Proto-Indo-European root, which denote similar notions but have satem features; cf.
Lithuanian žardas 'a wooden construction', Latvian, Old Prussian sardis 'horse pen', OCS žurdu 'hen
coop', Russian žerd 'id', as compared with the centum-type ones: cf. OCS gradu 'city', Russian gorod
'id', Polish grod 'id', OCS gorditi, graditi 'to build', etc. (see DELR, Introduction, 28).


M. Vinereanu 